Locality Sensitive Hashing Based Clustering

نویسندگان

  • Xin Jin
  • Jiawei Han
چکیده

Definition In learning systems with kernels, the shape and size of a kernel plays a critical role for accuracy and generalization. Most kernels have a distance metric parameter, which determines the size and shape of the kernel in the sense of a Mahalanobis distance. Advanced kernel learning tune every kernel’s distance metric individually, instead of turning one global distance metric for all kernels.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering

Large scale agglomerative clustering is hindered by computational burdens. We propose a novel scheme where exact inter-instance distance calculation is replaced by the Hamming distance between Kernelized Locality-Sensitive Hashing (KLSH) hashed values. This results in a method that drastically decreases computation time. Additionally, we take advantage of certain labeled data points via distanc...

متن کامل

Efficient Clustering of Metagenomic Sequences using Locality Sensitive Hashing

The new generation of genomic technologies have allowed researchers to determine the collective DNA of organisms (e.g., microbes) co-existing as communities across the ecosystem (e.g., within the human host). There is a need for the computational approaches to analyze and annotate the large volumes of available sequence data from such microbial communities (metagenomes). In this paper, we devel...

متن کامل

OPI-JSA at CLEF 2017: Author Clustering and Style Breach Detection

In this paper, we propose methods for author identification task dividing into author clustering and style breach detection. Our solution to the first problem consists of locality-sensitive hashing based clustering of real-valued vectors, which are mixtures of stylometric features and bag of n-grams. For the second problem, we propose a statistical approach based on some different tf-idf featur...

متن کامل

A Content-based Music Similarity Retrieval Scheme by Using BoW Representation and LSH-based Retrieval

This extended abstract paper presents detailed information about a content-based music similarity retrieval scheme, which is based on locality sensitive hashing (LSH). Our scheme considered MFCC and time histogram (TH) as two major features to represent the properties of audio music similarity. Next, each feature is depicted by Bag of Words (BoW), which k-means clustering summarizes extracted f...

متن کامل

Scalable Techniques for Clustering the Web

Clustering is one of the most crucial techniques for dealing with the massive amount of information present on the web. Clustering can either be performed once offline, independent of search queries, or performed online on the results of search queries. Our offline approach aims to efficiently cluster similar pages on the web, using the technique of Locality-Sensitive Hashing (LSH), in which we...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010